For this report, we were given a dataset of patients of the Hospital Austral who had suffered from heart disease and were treated there with either surgery or angioplasty. The goal of this report is to predict which of these two interventions a future patient will need.
In this section, we analyze the different factors and variables that the predictor could take into account.
Given the dataset, we considered it essential to start by analyzing the percentage of men and women in the registry. Even though sex is not usually a primary factor for the predictor, it is worth showing the difference so we can decide whether to use it.
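The percentages behind this comparison are simple relative frequencies. A minimal sketch, assuming sex is stored as a label such as "M" or "F" for each patient (the labels, the function name `sex_percentages` and the toy sample are hypothetical, not the registry's actual format):

```python
from collections import Counter


def sex_percentages(sexes):
    """Return the percentage of each label in a list such as ["M", "F", "M"]."""
    counts = Counter(sexes)
    total = sum(counts.values())
    if total == 0:  # avoid dividing by zero on an empty registry
        return {}
    return {sex: 100 * n / total for sex, n in counts.items()}


# Hypothetical toy sample, not the Hospital Austral data.
print(sex_percentages(["M", "M", "M", "F"]))  # {'M': 75.0, 'F': 25.0}
```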
It is also relevant to analyze the difference in age between men and women. The density graph makes it easy to compare the age distribution of each sex.
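Density graphs like this one are usually drawn with a kernel density estimate, which places a small smooth bump on every observation and adds them up. A minimal sketch with a Gaussian kernel, assuming a hand-picked bandwidth of 5 years and hypothetical ages (the plotting tool actually used for the report may choose its kernel and bandwidth differently):

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the probability density of `samples`
    by placing a Gaussian bump of width `bandwidth` on every observation."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical ages per sex; the 5-year bandwidth is an arbitrary choice.
ages_by_sex = {"M": [55, 61, 64, 70], "F": [62, 68, 73, 77]}
densities = {sex: gaussian_kde(ages, bandwidth=5.0) for sex, ages in ages_by_sex.items()}
for age in range(50, 85, 5):
    print(age, {sex: round(f(age), 4) for sex, f in densities.items()})
```

Plotting both curves on the same axis gives the kind of side-by-side comparison described above; a larger bandwidth gives smoother, less detailed curves.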
We found it very useful to look at the number of patients with each risk factor by age. Although age alone is not usually a fundamental cause of cardiac issues, it unfortunately raises the probability of having other risk factors, such as COPD, obesity, diabetes or dialysis.
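Counting risk factors by age amounts to binning ages into groups and tallying each factor inside every bin. A minimal sketch, assuming each patient is a record with an `age` field and one boolean field per risk factor (the column names, the 10-year bins and the toy records are hypothetical):

```python
from collections import defaultdict

# Hypothetical column names for the risk factors mentioned in the report.
RISK_FACTORS = ("copd", "obesity", "diabetes", "dialysis")


def risk_factor_counts_by_age_group(patients, bin_width=10):
    """For each age bin (e.g. "60-69"), count how many patients have each risk factor."""
    counts = defaultdict(lambda: dict.fromkeys(RISK_FACTORS, 0))
    for p in patients:
        start = (p["age"] // bin_width) * bin_width
        # Accessing the row creates the bin even if the patient has no risk factor.
        row = counts[f"{start}-{start + bin_width - 1}"]
        for factor in RISK_FACTORS:
            if p[factor]:
                row[factor] += 1
    # Order the bins numerically by their lower bound.
    return dict(sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])))


toy_patients = [
    {"age": 45, "copd": False, "obesity": True, "diabetes": False, "dialysis": False},
    {"age": 67, "copd": True, "obesity": False, "diabetes": True, "dialysis": False},
    {"age": 63, "copd": False, "obesity": False, "diabetes": True, "dialysis": True},
]
for group, row in risk_factor_counts_by_age_group(toy_patients).items():
    print(group, row)
```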
This graph contains relevant information for analyzing the dataset in terms of what we are going to predict: which procedure a patient with cardiac problems must undergo.
This easy-to-read pie chart shows that angioplasties are more frequent. However, surgeries are also very common.
These two graphs show how important the reason for admission is, and that the majority of patients were admitted on a scheduled date.
Next, we interpret the dataset with graphs that cross multiple variables.
Using two Venn diagrams, we can easily identify patients with multiple risk factors; for example, six male patients suffer from both obesity and diabetes. This type of graph is very useful to see the procedure chosen for each patient and how multiple factors may affect the decision.
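Each region of a Venn diagram is the group of patients with exactly one combination of risk factors, so the numbers written inside the circles can be obtained by counting those combinations. A minimal sketch, assuming boolean risk-factor fields and a `sex` field so that one diagram can be computed per sex (all field names and toy records are hypothetical):

```python
from collections import Counter


def venn_region_counts(patients, factors):
    """Count patients in each Venn region, i.e. each exact combination of the given factors."""
    regions = Counter()
    for p in patients:
        present = sorted(f for f in factors if p[f])
        if present:  # patients with none of the factors lie outside every circle
            regions[" & ".join(present)] += 1
    return dict(regions)


toy = [
    {"sex": "M", "obesity": True, "diabetes": True},
    {"sex": "M", "obesity": True, "diabetes": False},
    {"sex": "F", "obesity": False, "diabetes": True},
    {"sex": "M", "obesity": False, "diabetes": False},
]
for sex in ("M", "F"):
    subset = [p for p in toy if p["sex"] == sex]
    print(sex, venn_region_counts(subset, ["obesity", "diabetes"]))
```

The same function works for three or more factors, since every combination becomes its own key.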
Here is where things start to get interesting. We check how important age is as a variable, that is, how much weight it should have once we build the predictor.
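One simple way to check whether age separates the two procedures is to compare the ages of the patients in each group; a clear gap suggests age carries useful information for the predictor. A minimal sketch, assuming each record has hypothetical `age` and `procedure` fields (toy values, not the registry):

```python
from statistics import mean, median


def age_summary_by_procedure(patients):
    """Group patients by procedure and summarize their ages."""
    ages = {}
    for p in patients:
        ages.setdefault(p["procedure"], []).append(p["age"])
    return {
        proc: {"n": len(values), "mean": mean(values), "median": median(values)}
        for proc, values in ages.items()
    }


toy_patients = [
    {"age": 58, "procedure": "angioplasty"},
    {"age": 64, "procedure": "angioplasty"},
    {"age": 71, "procedure": "surgery"},
    {"age": 67, "procedure": "surgery"},
    {"age": 75, "procedure": "surgery"},
]
for proc, summary in age_summary_by_procedure(toy_patients).items():
    print(proc, summary)
```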
As with the previous graph, we look at how important a variable is, in this case the risk factors. The majority of patients with diabetes undergo angioplasty, whereas the majority of obese patients undergo surgery. This is important information to take into account for the predictor.
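Statements such as "the majority of patients with diabetes go to angioplasty" come from the share of each procedure among the patients who have a given factor. A minimal sketch, assuming boolean risk-factor fields and a `procedure` field (hypothetical names and toy records):

```python
from collections import Counter


def procedure_share_by_factor(patients, factors):
    """For each risk factor, the percentage of patients with that factor
    who underwent each procedure."""
    shares = {}
    for factor in factors:
        procedures = Counter(p["procedure"] for p in patients if p[factor])
        total = sum(procedures.values())
        shares[factor] = {proc: 100 * n / total for proc, n in procedures.items()} if total else {}
    return shares


toy_patients = [
    {"diabetes": True, "obesity": False, "procedure": "angioplasty"},
    {"diabetes": True, "obesity": True, "procedure": "surgery"},
    {"diabetes": True, "obesity": False, "procedure": "angioplasty"},
    {"diabetes": False, "obesity": True, "procedure": "surgery"},
]
print(procedure_share_by_factor(toy_patients, ["diabetes", "obesity"]))
```

Note that a patient with several factors is counted once under each of them, which is also how the overlapping Venn regions behave.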
Once we had analyzed and interpreted the dataset, we selected the variables to use in the model: age, risk factors and number of lesions. We then used 70% of the data to train the model and the remaining 30% to test how effective it actually is.
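The 70/30 partition can be sketched as a shuffle followed by a cut. The report does not say whether the split was purely random or stratified by procedure, so this is an assumed plain random split with a fixed seed; the feature names in `FEATURES` are also hypothetical placeholders for the three selected variables:

```python
import random

# Hypothetical column names for the three selected variables.
FEATURES = ("age", "risk_factors", "lesions")


def select_features(patient):
    """Keep only the variables chosen for the model."""
    return tuple(patient[f] for f in FEATURES)


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of `records` and cut it into train and test partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


print(select_features({"age": 64, "risk_factors": 2, "lesions": 3, "sex": "M"}))  # (64, 2, 3)
train, test = train_test_split(range(100))
print(len(train), len(test))  # 70 30
```

A stratified split would instead shuffle angioplasty and surgery patients separately, so both partitions keep the same proportion of each procedure.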
Using the test partition, we present the results as a confusion matrix, which contrasts the predicted values with the actual values.

| Actual / Predicted | Angioplasty | Surgery |
|---|---|---|
| Angioplasty | 68 | 2 |
| Surgery | 7 | 27 |
Columns show what the model predicted and rows show the actual procedures. The accuracy is 91.35%: 95 of the 104 test patients (68 + 27) were classified correctly.
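The accuracy can be checked directly from the confusion matrix, and the same counts also give per-class precision and recall. In this sketch, rows are actual classes and columns are predicted classes, as the report states; the function name `metrics_from_confusion` is our own:

```python
def metrics_from_confusion(matrix, labels):
    """Accuracy plus per-class precision and recall.

    matrix[i][j] = number of patients whose actual class is labels[i]
    and whose predicted class is labels[j].
    """
    k = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    result = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        predicted_as_label = sum(matrix[r][i] for r in range(k))  # column sum
        actually_label = sum(matrix[i])  # row sum
        result[f"precision_{label}"] = matrix[i][i] / predicted_as_label
        result[f"recall_{label}"] = matrix[i][i] / actually_label
    return result


labels = ["angioplasty", "surgery"]
matrix = [[68, 2], [7, 27]]  # counts from the confusion matrix in this report
m = metrics_from_confusion(matrix, labels)
print(f"accuracy = {m['accuracy']:.2%}")  # accuracy = 91.35%
```

With these counts, the recall for surgery is 27/34 (about 79%), so most of the errors are surgical patients predicted as angioplasty rather than the other way around.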
Finally, we plot the ROC curve, which compares the true positive rate with the false positive rate across classification thresholds, and compute the area under it (AUROC). The curve confirms that the false positive rate is low.
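The ROC curve is built by lowering the decision threshold one score at a time and recording the false and true positive rates at each step; the AUROC is the area under those points. A minimal sketch, assuming the model outputs a probability of surgery and that surgery is the positive class (the report does not say which class is positive; the toy scores are hypothetical):

```python
def roc_curve(labels, scores):
    """Return the ROC curve as (false positive rate, true positive rate) points.

    labels: 1 for the positive class, 0 otherwise.
    scores: the model's predicted probability of the positive class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)  # highest score first
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Lower the threshold past every sample tied at this score in one step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auroc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical toy scores; 1 = surgery, 0 = angioplasty (an assumed encoding).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auroc(roc_curve(labels, scores)), 3))  # 0.889
```

An AUROC of 1.0 means every surgery patient is scored above every angioplasty patient, while 0.5 corresponds to random guessing.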